Code for implementing iMAP is designed with four bundles of commands wrapped individually in bash driver scripts for performing bioinformatic analysis of microbiome data (Figure 1). The output is then transformed into data structure suitable for conducting exploratory visualization. A progress report is generated at each major analysis step to summarize the output.
Schematic illustration of the iMAP pipeline.
The first step is to gather all materials needed for implementing the iMAP pipeline smoothly (Table S1).
| Required | Description | Folder | Remarks |
|---|---|---|---|
| iMAP pipeline | Bundled scripts for comprehensive microbiome analysis | iMAP | Link |
| Hardware | Computer with multi-core processor: preferably 64-bit. | ||
| Remote Accessory Memory (RAM): 8 GB minimum. | |||
| Storage: Tens of gigabytes for small dataset otherwise a few terabytes | |||
| Raw data | Demultiplexed reads in FASTQ format with primers and barcodes removed | data/references | |
| Sample metadata | A tab-separated file showing sample identifiers, categorical and numeric variables | data/metadata | |
| Mapping file | A file that links sample IDs (1st column) to the names of forward (2nd column) and reverse (3rd column) data files | ||
| Design files | Files that assign samples to a specific variables or other categories | ||
| Software | |||
| sekit | For inspecting rawdata format and simple statistics | code | Link |
| FASTQc | For creating base call quality score images and statistics | code | Link |
| bbmap_bbduk | For trimming poor quality reads | code | Link |
| multiqc | For summarizing FASTQc output | Link | |
| Mothur | For sequence processing and classifying the sequences and preliminary analysis | code | Link |
| Statistical analysis and visualization | |||
| R | For statistical analysis and visualization | Link | |
| Rstudio | An IDE (integrated development environment) for R | Link | |
| iTOL | For display, annotation and management of phylogenetic trees | Link | |
| Reference 16S rRNA gene alignments | |||
| SILVA (nr) | Reference rRNA alignments | data/references | Link |
| Reference 16S rRNA gene classifiers | |||
| SILVA(no gap) | Degapped using degap.seqs function in Mothur | data/references | Link |
| RDP | Mothur-formatted | data/references | Link |
| Greengenes | Mothur-formatted | data/references | Link |
| EzBioCloud | Mothur-formatted | data/references | Link |
| Custom classifiesr | Any manually built classifiers | ||
git clone https://github.com/tmbuza/iMAP.git
cd iMAP
# OR
curl -LOk https://github.com/tmbuza/iMAP/archive/master.zip
unzip master.zip
mv iMAP-master iMAP
rm -rf master.zip
cd iMAP
# OR
wget --no-check-certificate https://github.com/tmbuza/iMAP/archive/master.zip
unzip master.zip
mv iMAP-master iMAP
rm -rf master.zip
cd iMAP
# Mac
bash ./code/requirements/iMAP_requirements_mac_driver.bash
# Linux
bash ./code/requirements/iMAP_requirements_linux_driver.bash
bash ./code/requirements/iMAP_checkFiles_driver.bash
Figure S1: Major folders in the iMAP root directory. Folders and files marked with ✅ exist. Missing file marked ❌ must be found by the above script before proceeding.
This is basically a method where users sequentially run individual or bundle scripts on CLI (Command -Line_Interface) one at a time. We have bundled workflow-specific scripts into a driver to make the analysis easily implemented on CLI by just a single click.
bash ./code/requirements/iMAP_requirements_mac_driver.bash
bash ./code/requirements/iMAP_checkFiles_driver.bash
bash ./code/preprocessing/iMAP_preprocessing_driver.bash
bash ./code/summarizeFastQC/iMAP_multiqc_driver.bash
bash ./code/mockcommunity/iMAP_mockcommunity_driver.bash
bash ./code/seqprocessing/iMAP_seqprocessing_driver.bash
bash ./code/seqclassification/iMAP_seqclassification_driver.bash
bash ./code/seqerrorrate/iMAP_seqerrorrate_driver.bash
bash ./code/otutaxonomy/iMAP_otutaxonomy_driver.bash
bash ./code/annotation/01_processed_seqs.bash
bash ./code/dataanalysis/iMAP_dataanalysis_demo_driver.bash
The iMAP_driver.bash is the master driver for running all analyses on CLI at once.
bash ./code/iMAP_driver.bash
Users must create a Portable Batch System (PBS) script that describes cluster resources to be used, parameters for the job and the commands to be executed. The following is a PBS script for running executing iMAP pipeline remotely. Note that you must provide the group allocation name (-A) but this may differ from one system to the other. Google for help just in case.
Individual driver
#!/bin/bash -f
#PBS iMAPtest
#PBS -A group allocation name
#PBS -l nodes=1:ppn=8
#PBS -l walltime=4000:00:00
#PBS -l pmem=20gb
#PBS -j oe
#PBS -o iMAPtest.log
#PBS -m abe
#PBS -M tmb72@psu.edu
cd $PBS_O_WORKDIR
bash ./code/requirements/iMAP_requirements_mac_driver.bash
bash ./code/requirements/iMAP_checkFiles_driver.bash
bash ./code/preprocessing/iMAP_preprocessing_driver.bash
bash ./code/summarizeFastQC/iMAP_multiqc_driver.bash
bash ./code/mockcommunity/iMAP_mockcommunity_driver.bash
bash ./code/seqprocessing/iMAP_seqprocessing_driver.bash
bash ./code/seqclassification/iMAP_seqclassification_driver.bash
bash ./code/seqerrorrate/iMAP_seqerrorrate_driver.bash
bash ./code/otutaxonomy/iMAP_otutaxonomy_driver.bash
bash ./code/annotation/01_processed_seqs.bash
bash ./code/dataanalysis/iMAP_dataanalysis_demo_driver.bash
Batch mode
#!/bin/bash -f
#PBS iMAPtest
#PBS -A group allocation name
#PBS -l nodes=1:ppn=8
#PBS -l walltime=4000:00:00
#PBS -l pmem=20gb
#PBS -j oe
#PBS -o iMAPtest.log
#PBS -m abe
#PBS -M tmb72@psu.edu
cd $PBS_O_WORKDIR
bash code/iMAP_driver.bash
Status of metadata
| variable | q_zeros | p_zeros | q_na | p_na | q_inf | p_inf | type | unique |
|---|---|---|---|---|---|---|---|---|
| SampleID | 0 | 0.00 | 0 | 0 | 0 | 0 | character | 360 |
| BarcodeSequence | 0 | 0.00 | 360 | 100 | 0 | 0 | character | 0 |
| ForwardPrimerSequence | 0 | 0.00 | 360 | 100 | 0 | 0 | character | 0 |
| ReversePrimerSequence | 0 | 0.00 | 360 | 100 | 0 | 0 | character | 0 |
| ForwardRead | 0 | 0.00 | 0 | 0 | 0 | 0 | character | 360 |
| ReverseRead | 0 | 0.00 | 0 | 0 | 0 | 0 | character | 360 |
| origGroup | 0 | 0.00 | 0 | 0 | 0 | 0 | character | 360 |
| Sex | 0 | 0.00 | 0 | 0 | 0 | 0 | character | 2 |
| Time | 0 | 0.00 | 0 | 0 | 0 | 0 | character | 2 |
| DayID | 0 | 0.00 | 0 | 0 | 0 | 0 | character | 35 |
| DPW | 12 | 3.33 | 0 | 0 | 0 | 0 | integer | 35 |
| Description | 0 | 0.00 | 0 | 0 | 0 | 0 | character | 1 |
Key: q_zeros: quantity of zeros or missing data ; q_na: quantity of NA; q_inf: quantity of infinite values; type: factor, character, integer or numeric..; unique: quantity of unique values. Percentages are indicated by p_.
When reviewing metadata status you may notice that numeric variables are confused with numeric data. For example, variable DPW (days post weaning) on Table x shows 12 missing values (3.33%) which is incorrect. As investigators we know that day 1 was coded zero (0) (12 zeros for 12 mouse) which in descriptive statistics is interpreted as missing data. To correct this we re-coded the samples with unique identifies shown as DayID to distinguish them from integers. However, depending of experiment this kind of variables need to be converted to character during analysis.
First 5 lines of mapping metadata
| SampleID | origGroup | Sex | Time | DayID | DPW |
|---|---|---|---|---|---|
| F3D000 | F3D0 | Female | Early | D000 | 0 |
| F3D001 | F3D1 | Female | Early | D001 | 1 |
| F3D002 | F3D2 | Female | Early | D002 | 2 |
| F3D003 | F3D3 | Female | Early | D003 | 3 |
| F3D005 | F3D5 | Female | Early | D005 | 5 |
Last 5 lines of mapping metadata
| SampleID | origGroup | Sex | Time | DayID | DPW |
|---|---|---|---|---|---|
| M6D148 | M6D148 | Male | Late | D148 | 148 |
| M6D149 | M6D149 | Male | Late | D149 | 149 |
| M6D150 | M6D150 | Male | Late | D150 | 150 |
| M6D175 | M6D175 | Male | Late | D175 | 175 |
| M6D364 | M6D364 | Male | Late | D364 | 364 |
Figure 3. Frequency of experimental variables
Number of forward reads before QC (Original_R1): 3634461 ( 100 % )
Number of reverse reads before QC (Original_R2): 3634461 ( 100 % )
Total number of reads before QC: 7268922
Figure x: ensity and histograms plots showing forward and reverse read length. Dotted line indicates mean value for the specified variable
Grouped by Sex variable
Figure x: Barplots of pre-processed reads grouped by sex variable.
Grouped by Time variable
Figure x: Barplots of pre-processed reads grouped by time variable
Grouped by days-post-weaning (D) variable
Figure x: Barplots of pre-processed reads grouped by time variable
open ./results/multiqc/qc0/multiqc_report.html
open ./results/multiqc/qc0/R1/multiqc_report.html
open ./results/multiqc/qc0/R2/multiqc_report.html
Figure x: Example of report generated by running open ./results/multiqc/qc0/multiqc_report.html on CLI
open ./results/multiqc/qctrim25/multiqc_report.html
open ./results/multiqc/qctrim25/R1/multiqc_report.html
open ./results/multiqc/qctrim25/R2/multiqc_report.html
Figure x: Example of report generated by running open ./results/multiqc/qced/R2/multiqc_report.html on CLI
open ./results/multiqc/qced/multiqc_report.html
open ./results/multiqc/qced/R1/multiqc_report.html
open ./results/multiqc/qced/R2/multiqc_report.html
Figure x: Example of report generated by running open ./results/multiqc/qced/R2/multiqc_report.html on CLI
| variable | q_zeros | p_zeros | q_na | p_na | q_inf | p_inf | type | unique |
|---|---|---|---|---|---|---|---|---|
| SampleID | 0 | 0 | 0 | 0 | 0 | 0 | character | 360 |
| origGroup | 0 | 0 | 0 | 0 | 0 | 0 | character | 360 |
| Sex | 0 | 0 | 0 | 0 | 0 | 0 | character | 2 |
| Time | 0 | 0 | 0 | 0 | 0 | 0 | character | 2 |
| DayID | 0 | 0 | 0 | 0 | 0 | 0 | character | 35 |
| Original_R1 | 0 | 0 | 0 | 0 | 0 | 0 | numeric | 357 |
| TrimQ25_R1 | 0 | 0 | 0 | 0 | 0 | 0 | numeric | 357 |
| NophiX_R1 | 0 | 0 | 0 | 0 | 0 | 0 | numeric | 359 |
| Original_R2 | 0 | 0 | 0 | 0 | 0 | 0 | numeric | 357 |
| TrimQ25_R2 | 0 | 0 | 0 | 0 | 0 | 0 | numeric | 357 |
| NophiX_R2 | 0 | 0 | 0 | 0 | 0 | 0 | numeric | 359 |
Key: q_zeros: quantity of zeros or missing data ; q_na: quantity of NA; q_inf: quantity of infinite values; type: factor, character, integer or numeric..; unique: quantity of unique values. Percentages are indicated by p_.
Figure x: Stacked barplots of pre-processed reads grouped by QC variable
Figure x: Boxplot of pre-processed reads plotted with overplotting points (A) and jitter points (B).
Number of forward reads before QC (Original_R1): 3634461 ( 100 % )
Number of reverse reads before QC (Original_R2): 3634461 ( 100 % )
Number of forward reads after trimming at Q=25 (TrimQ25_R1): 3631940 ( 99.93064 % )
Number of reverse reads after trimming at Q=25 (TrimQ25_R2): 3631940 ( 99.93064 % )
Number of forward reads after phiX removal (NophiX_R1): 3631769 ( 99.92593 % )
Number of reverse reads after phiX removal (NophiX_R2): 3631769 ( 99.92593 % )
| Original_R1 | TrimQ25_R1 | NophiX_R1 | Original_R2 | TrimQ25_R2 | NophiX_R2 | |
|---|---|---|---|---|---|---|
| Min. : 14 | Min. : 14 | Min. : 14 | Min. : 14 | Min. : 14 | Min. : 14 | |
| 1st Qu.: 5368 | 1st Qu.: 5365 | 1st Qu.: 5365 | 1st Qu.: 5368 | 1st Qu.: 5365 | 1st Qu.: 5365 | |
| Median : 8001 | Median : 7996 | Median : 7996 | Median : 8001 | Median : 7996 | Median : 7996 | |
| Mean :10096 | Mean :10089 | Mean :10088 | Mean :10096 | Mean :10089 | Mean :10088 | |
| 3rd Qu.:13636 | 3rd Qu.:13630 | 3rd Qu.:13630 | 3rd Qu.:13636 | 3rd Qu.:13630 | 3rd Qu.:13630 | |
| Max. :40113 | Max. :40077 | Max. :40077 | Max. :40113 | Max. :40077 | Max. :40077 |
The clean paired-end sequences that passed the preprocessing trimming and filtering were processed and classified using mothur-based phylotype and OTU-based approaches as described [@Schloss2009]. This step enables investigators to review merged sequences to determine if expected length is in-line with the targeted 16S rRNA gene region. Lack of good overlapping region could results into too many sequences that are wrong and the effect will be propagated in the downstream analysis. High quality sequences align in the same region when searched against a reference template.
| Length | Overlap_Length | MisMatches | |
|---|---|---|---|
| Min. : 14 | Min. : 0.0 | Min. : 0.000 | |
| 1st Qu.:252 | 1st Qu.:102.0 | 1st Qu.: 0.000 | |
| Median :252 | Median :153.0 | Median : 0.000 | |
| Mean :241 | Mean :137.7 | Mean : 1.385 | |
| 3rd Qu.:253 | 3rd Qu.:173.0 | 3rd Qu.: 0.000 | |
| Max. :500 | Max. :250.0 | Max. :116.000 |
| QueryLength | PairwiseAlignmentLength | SimBtwnQuery&Template | |
|---|---|---|---|
| Min. : 16.0 | Min. : 1.0 | Min. : 50.00 | |
| 1st Qu.:252.0 | 1st Qu.:252.0 | 1st Qu.: 96.05 | |
| Median :253.0 | Median :253.0 | Median : 98.41 | |
| Mean :244.4 | Mean :244.5 | Mean : 96.19 | |
| 3rd Qu.:253.0 | 3rd Qu.:253.0 | 3rd Qu.: 99.21 | |
| Max. :310.0 | Max. :315.0 | Max. :100.00 |
Figure x: Stacked barplots of processed sequences grouped by processing variable
In summary the taxonomy data was generated with high stringent quality control parameters. Initially, 2,920,782 quality-checked sequences were classified at 80% identity against 16S rRNA gene sequences deposited in version 132 of SILVA rRNA database as described in the previous section. In OTUbased approach in particular, the clustering done by opticlust algorithm in-built in-built in mothur yielded reliable output based on the high metrics generated and a low FDR = 0.002 obtained (Table 1).
| Parameter | Value |
|---|---|
| cutoff | 0.2 |
| Sensitivity | 0.998 |
| Specificity | 0.999 |
| PPV | 0.998 |
| NPV | 0.999 |
| Accuracy | 0.999 |
| MCC | 0.997 |
| F1score | 0.998 |
| FDR | 0.002 |
Figure x: Unfiltered and curated OTU abundance. Visual representation of taxon terms highlight the most abundant taxon based on frequency of being assigned to an OTU or tree nodes. Muribaculaceae is the most frequently assigned family and Muribaculaceae_ge is the most frequent species assigned to most sequences.
OTUs
[1] "Otu001" "Otu002" "Otu003" "Otu004" "Otu005" "Otu006" "Otu007"
[8] "Otu008" "Otu010" "Otu011" "Otu012" "Otu013" "Otu014" "Otu016"
[15] "Otu017" "Otu019" "Otu020" "Otu021" "Otu022" "Otu023" "Otu024"
[22] "Otu026" "Otu028" "Otu029" "Otu031" "Otu032" "Otu037" "Otu038"
[29] "Otu041" "Otu045"
Phylum
[1] "Bacteroidetes" "Firmicutes"
Classn
[1] "Bacilli" "Bacteroidia" "Clostridia"
[4] "Erysipelotrichia"
Order
[1] "Anaeroplasmatales" "Bacteroidales" "Bifidobacteriales"
[4] "Clostridiales" "Erysipelotrichales" "Lactobacillales"
[7] "Mollicutes_RF39" "Verrucomicrobiales"
Family
[1] "Akkermansiaceae" "Bacteroidaceae"
[3] "Bifidobacteriaceae" "Clostridiaceae_1"
[5] "Clostridiales_vadinBB60_group" "Erysipelotrichaceae"
[7] "Lachnospiraceae" "Lactobacillaceae"
[9] "Mollicutes_RF39_fa" "Muribaculaceae"
[11] "Peptococcaceae" "Rikenellaceae"
[13] "Ruminococcaceae"
Genus
[1] "Acetatifactor" "Akkermansia"
[3] "Alistipes" "Anaerotruncus"
[5] "Bacteroides" "Bifidobacterium"
[7] "Candidatus_Arthromitus" "Clostridiales_vadinBB60_group_ge"
[9] "Lachnoclostridium" "Lachnospiraceae_NK4A136_group"
[11] "Lachnospiraceae_UCG.001" "Lactobacillus"
[13] "Mollicutes_RF39_ge" "Muribaculaceae_ge"
[15] "Oscillibacter" "Roseburia"
[17] "Ruminiclostridium" "Ruminiclostridium_5"
[19] "Ruminiclostridium_9" "Ruminococcaceae_ge"
[21] "Turicibacter"
Figure x. Rank abundance of eight selected samples. Package: goeveg
Figure x. Correlation between species identified at phylum-level. Species are ordered alphabetically (top panel) and heuristically (bottom panel)
Figure x: Stacked barplots for species richness. The estimated richness (green bars) was calculated using chao calculator and observed ichness (red bars) was calculated using sobs.
Figure x: Species richness (observed species) displayed by boxplot (A), density plots (B) and histograms (C).
Figure x: Correlation between species richness and sequence depth. Observed species calculated using sobs (A) and estimated species richness by chao calculator (B).
Figure x: Species diversity and correlation to species richness. Definitely phylo-diversity (C) correlates well with the species richness.
Figure x: Species diversity estimates as a function of sample size. Only species with abundance greater or equal to 1 are detected in the sample.
Figure x. Rarefaction and extrapolation curves. Sample-size-based curve (A), sample completeness curve (B), Coverage‐based curves (C).
OTUbased
Number_clusters Value_Index
2.0000 22.3718
Phylum
Number_clusters Value_Index
2.0000 114.5654
Class
Number_clusters Value_Index
2.0000 73.9034
Order
Number_clusters Value_Index
2.00 42.17
Family
Number_clusters Value_Index
2.000 31.017
Genus
Number_clusters Value_Index
2.000 24.511
Figure x: Optimal number of OTU clusters. The suggested number of best clusters (dotted line) thta could expllain most variation is 2 for OTUs (A), 3 for phylum (B), 3 for class (C), 2 for Order (D), 10 for Family (E) and 2 for Genus (F). A high average silhouette width indicates high quality clustering.
cluster size ave.sil.width
1 1 232 0.67
2 2 128 0.25
cluster size ave.sil.width
1 1 240 0.68
2 2 120 0.24
cluster size ave.sil.width
1 1 239 0.67
2 2 121 0.22
cluster size ave.sil.width
1 1 239 0.67
2 2 121 0.22
cluster size ave.sil.width
1 1 229 0.67
2 2 131 0.19
cluster size ave.sil.width
1 1 241 0.68
2 2 119 0.29
Figure x: Silhouette plot guided by the best number of clusters. Observations with a large Si (almost 1) are very well clustered. A small Si (around 0) means that the observation lies between two clusters while a negative Si are probably placed in the wrong cluster.
Figure x: Scree plot of PCA. Shows which components explain most of the variability in the data. Over 80% of the variances contained in OTU and taxonomy data are retained by the first two principal components. The first PC explains the maximum amount of variation in the data set.
While PCA is based on Euclidean distances the PCoA is based on the (dis)similarity matrix calculated from OTU abundance data as described earlier. Literally, in any successful PCA or PCoA the first few axes are supposed to capture most of the variation in the input data. NMDS tries to substitute the original distance data with ranks. Unlike the PCA and PCoA the NMDS axes of the ordination are not ordered according to the variance they explain, instead a plot of stress values (a measure of goodness-of-fit) against dimensionality can be used to assess the proper choice of dimensions. Note that stress values >0.2 are generally considered hard to interpret, whereas values <0.1 are good and <0.05 are the better. In any case the inflexion point on scree plots and Shepard plots (stress plots) can be used to guide the selection of a minimum number of dimensions to use in the interpretation of the multidimensional data.
Figure x: Principal coordinate analysis ordination using Bray-Curtis dissimilarity matrix. Objects that are ordinated closer together have smaller dissimilarity values than those ordinated further apart. A successful PCoA will capture most of the variation in the (dis)similarity matrix in a few PCoA axes.
OTUs
----------------------------
Call:
metaMDS(comm = otu.t[, -1], distance = "bray", k = 3, try = 10, display = c("sites"), choices = c(1, 2), type = "t", shrink = FALSE)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(otu.t[, -1]))
Distance: bray
Dimensions: 3
Stress: 0.1182664
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(otu.t[, -1]))'
Phylum
----------------------------
Call:
metaMDS(comm = phylum.t[, -1], distance = "bray", k = 3, try = 10, display = c("sites"), choices = c(1, 2), type = "t", shrink = FALSE)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(phylum.t[, -1]))
Distance: bray
Dimensions: 3
Stress: 0.1201002
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(phylum.t[, -1]))'
Class
----------------------------
Call:
metaMDS(comm = class.t[, -1], distance = "bray", k = 3, try = 10, display = c("sites"), choices = c(1, 2), type = "t", shrink = FALSE)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(class.t[, -1]))
Distance: bray
Dimensions: 3
Stress: 0.1402302
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(class.t[, -1]))'
Order
----------------------------
Call:
metaMDS(comm = order.t[, -1], distance = "bray", k = 3, try = 10, display = c("sites"), choices = c(1, 2), type = "t", shrink = FALSE)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(order.t[, -1]))
Distance: bray
Dimensions: 3
Stress: 0.143402
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(order.t[, -1]))'
Family
----------------------------
Call:
metaMDS(comm = family.t[, -1], distance = "bray", k = 3, try = 10, display = c("sites"), choices = c(1, 2), type = "t", shrink = FALSE)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(family.t[, -1]))
Distance: bray
Dimensions: 3
Stress: 0.1343065
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(family.t[, -1]))'
Genus
----------------------------
Call:
metaMDS(comm = genus.t[, -1], distance = "bray", k = 3, try = 10, display = c("sites"), choices = c(1, 2), type = "t", shrink = FALSE)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(genus.t[, -1]))
Distance: bray
Dimensions: 3
Stress: 0.0001578126
Stress type 1, weak ties
Two convergent solutions found after 10 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(genus.t[, -1]))'
Figure X. Sherperd and non-metric multidimensional scaling plot. Green oints represent samples and red points represent OTU or species. Similar samples are ordinated together. Stress values are shown at the botthom of ordination plot.
Figure x: Sample Phylip or Newick-formatted tree clustered using the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) algorithm. Similar data was used to construct different types of tree including rectangular (A), circular (B) and unrooted (C) to view how samples were clustered.